Efficiently Computing Edit Distance to Dyck Language
Author
Abstract
Given a string σ over an alphabet Σ and a grammar G defined over the same alphabet, what is the minimum number of repairs (insertions, deletions and substitutions) required to map σ into a valid member of G? We investigate this basic question in this paper for DYCK(s). DYCK(s) is a fundamental context-free grammar representing the language of well-balanced parentheses with s different types of parentheses, and it has played a pivotal role in the development of the theory of context-free languages. It is also known that a nondeterministic version of DYCK(s) is the hardest context-free grammar. Computing edit distance to DYCK(s) has numerous applications, ranging from repairing semi-structured documents such as XML to memory checking, automated compiler optimization and natural language processing. The problem also significantly generalizes string edit distance, which has seen extensive developments over the last two decades and has attracted much attention in theoretical computer science as well as in the computational biology community. It is possible to develop a dynamic programming algorithm that exactly computes edit distance to DYCK(s) in time cubic in the string length. Such algorithms are not scalable. In this paper we give the first near-linear time algorithm for edit distance computation to DYCK(s) that achieves a nontrivial (polylogarithmic) approximation factor. In fact, given an algorithm that computes string edit distance on inputs of size n in α(n) time with a β(n)-approximation factor, we can devise an algorithm for the edit distance problem to DYCK(s) running in Õ(n + α(n)) time and achieving an approximation factor of O(β(n) (log OPT)^{1.5}). In Õ(n^{1+ε} + α(n)) time, we get an approximation factor of O((1/ε) β(n) log OPT) for any ε > 0. Here OPT is the optimal edit distance to DYCK(s). Since the best known near-linear time algorithm for the string edit distance problem has β(n) = poly log n, we get the desired bound.
Therefore, with the current state of the art, string and DYCK(s) edit distance can both be computed within a polylogarithmic approximation factor in near-linear time. This comes as a surprise, since DYCK(s) edit distance is a significant generalization of the string edit distance problem and their exact computations via dynamic programming show a marked difference in time complexity. Rather less surprisingly, we show that the framework for efficiently approximating edit distance to DYCK(s) can be utilized for many other languages. We illustrate this by considering various memory checking languages such as STACK, QUEUE, PQ and DEQUE, which consist of valid transcripts of stacks, queues, priority queues and double-ended queues respectively. Therefore, any language that can be recognized by these data structures can also be repaired efficiently by our algorithm.
Note: Õ(n) = O(n poly log n).
arXiv:1311.2557v2 [cs.DS], 12 Nov 2013
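The cubic-time exact dynamic program mentioned in the abstract can be illustrated with a minimal sketch. This is the textbook interval recurrence, not the paper's near-linear approximation algorithm. The names `PAIRS`, `pair_cost` and `dyck_edit_distance` are hypothetical, the bracket alphabet is fixed to three types for illustration, and the input is assumed to contain only bracket symbols. The sketch models deletions and substitutions only, on the assumption that inserting a partner for a symbol costs the same as deleting that symbol.

```python
# Hypothetical bracket alphabet with s = 3 types (an assumption for illustration).
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def pair_cost(a: str, b: str) -> int:
    """Substitutions needed so that a opens and b closes the same bracket type."""
    if a in PAIRS and PAIRS[a] == b:
        return 0  # already a matching pair
    if a in PAIRS or b in CLOSERS:
        return 1  # rewrite the other symbol to match
    return 2      # a is a closer and b is an opener: rewrite both


def dyck_edit_distance(s: str) -> int:
    """Exact edit distance from s to the Dyck language, in O(n^3) time."""
    n = len(s)
    # dp[i][j] = minimum edits that make the substring s[i:j] balanced.
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                dp[i][j] = 1  # a lone symbol must be deleted
                continue
            # Option 1: pair the first and last symbols of the interval.
            best = dp[i + 1][j - 1] + pair_cost(s[i], s[j - 1])
            # Option 2: split the interval into two balanced parts.
            for k in range(i + 1, j):
                best = min(best, dp[i][k] + dp[k][j])
            dp[i][j] = best
    return dp[0][n]
```

For example, `dyck_edit_distance("([)]")` evaluates the pairing option for the outer symbols and for the inner symbols, each needing one substitution. The three nested loops over `i`, `j` and `k` are what make this approach cubic and motivate the faster approximation studied in the paper.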
Similar resources
Edit distance between unlabeled ordered trees
There exists a bijection between one-stack sortable permutations (permutations which avoid the pattern 231) and planar trees. We define an edit distance between permutations which is coherent with the standard edit distance between trees. This one-to-one correspondence yields a polynomial algorithm for the subpermutation problem for (231)-avoiding permutations. Moreover, we obtain the generatin...
The Intractability of Computing the Hamming Distance
Given a string x and a language L, the Hamming distance of x to L is the minimum Hamming distance of x to any string in L. The edit distance of a string to a language is analogously defined. First, we prove that there is a language in AC^0 such that both Hamming and edit distance to this language are hard to approximate; they cannot be approximated with factor O(n^{1/3 − ε}), for any ε > 0, unless P =...
Computing the edit distance of a regular language
The edit distance (or Levenshtein distance) between two words is the smallest number of substitutions, insertions, and deletions of symbols that can be used to transform one of the words into the other. In this paper we consider the problem of computing the edit distance of a regular language (also known as constraint system), that is, the set of words accepted by a given finite automaton. This...
Journal title:
- CoRR
Volume abs/1311.2557, Issue -
Pages -
Publication date 2013